On Level Scheduling for Incomplete LU Factorization Preconditioners on Accelerators

Authors

  • Karl Rupp
  • Barry Smith
Abstract

The application of the finite element method for the numerical solution of partial differential equations naturally leads to large systems of linear equations represented by a sparse system matrix A and right-hand side b. These systems are commonly solved using iterative solvers, particularly Krylov subspace methods, which are typically accelerated using preconditioners to obtain good convergence rates [1]. One of the most popular families of preconditioners is the family of incomplete LU factorization (ILU) preconditioners, where the system matrix A is factored approximately into a sparse lower-triangular matrix L and a sparse upper-triangular matrix U. Each application of the preconditioner to a residual vector z then involves one forward substitution Ly = z and one backward substitution Ux = y. A drawback of ILU preconditioners is the limited amount of parallelism both in the factorization and in the triangular substitutions. This considerably complicates an efficient implementation on parallel computing architectures such as graphics processing units (GPUs). Our work is based on previous work by Li and Saad using level scheduling for ILU preconditioners on CUDA-enabled GPUs [2]. We refine their approach by considering modifications of reordering algorithms such as Cuthill-McKee or Gibbs-Poole-Stockmeyer for a higher degree of parallelism and thus higher computational efficiency of ILU preconditioners. Furthermore, we apply these techniques to block-ILU preconditioners and compare with the Power(q)-method by Heuveline, Lukarski, and Weiss [3]. Results obtained on GPUs from AMD and NVIDIA as well as on Intel's many-integrated-core (MIC) architecture will be presented. Our implementations are freely available in the open-source library ViennaCL [4], which is currently being integrated into the distributed solver package PETSc [5].
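
The forward substitution Ly = z mentioned in the abstract can be illustrated with a minimal sketch of the general level-scheduling idea. This is not the authors' or ViennaCL's implementation: the function names `compute_levels` and `level_scheduled_forward_solve` are hypothetical, L is assumed to be stored in CSR form (`indptr`, `indices`, `data`), and a row without a stored diagonal entry is assumed to have a unit diagonal. Rows are grouped into levels so that every row depends only on rows from earlier levels; rows within one level are mutually independent and could be processed in parallel on an accelerator, whereas here they are processed sequentially for clarity.

```python
def compute_levels(indptr, indices):
    """Group the rows of a sparse lower-triangular CSR matrix into levels.

    Row i is placed one level above the deepest row j < i it depends on,
    so all rows within one level can be solved independently.
    """
    n = len(indptr) - 1
    level = [0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j < i:  # strictly lower entry: row i needs y[j]
                level[i] = max(level[i], level[j] + 1)
    groups = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lv in enumerate(level):
        groups[lv].append(i)
    return groups


def level_scheduled_forward_solve(indptr, indices, data, z):
    """Solve L y = z level by level (assumption: unit diagonal if not stored)."""
    y = [0.0] * len(z)
    for rows in compute_levels(indptr, indices):
        # Rows in this level do not depend on each other; on a GPU this
        # inner loop would typically be one parallel kernel launch.
        for i in rows:
            acc = z[i]
            diag = 1.0
            for k in range(indptr[i], indptr[i + 1]):
                j = indices[k]
                if j < i:
                    acc -= data[k] * y[j]
                elif j == i:
                    diag = data[k]
            y[i] = acc / diag
    return y


# Example: L = [[2,0,0,0],[0,1,0,0],[1,1,4,0],[0,0,2,2]] in CSR form.
indptr = [0, 1, 2, 5, 7]
indices = [0, 1, 0, 1, 2, 2, 3]
data = [2.0, 1.0, 1.0, 1.0, 4.0, 2.0, 2.0]
print(compute_levels(indptr, indices))  # [[0, 1], [2], [3]]
print(level_scheduled_forward_solve(indptr, indices, data, [2.0, 3.0, 8.0, 6.0]))
# [1.0, 3.0, 1.0, 2.0]
```

In this example, four rows collapse into three levels, so the solve needs three sequential steps instead of four. The number of levels depends on the sparsity pattern of L, which is why the ordering of the unknowns influences how much parallelism is available.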

Similar resources

Multi-Elimination ILU Preconditioners on GPUs

Iterative solvers for sparse linear systems often benefit from using preconditioners. While there are implementations for many iterative methods that leverage the computing power of accelerators, porting the latest developments in preconditioners to accelerators has been challenging. In this paper we develop a self-adaptive multi-elimination preconditioner for graphics processing units (GPUs). T...

Computing a block incomplete LU preconditioner as the by-product of block left-looking A-biconjugation process

In this paper, we present a block version of the incomplete LU preconditioner, which is computed as the by-product of a block A-biconjugation process. The pivot entries of this block preconditioner are one-by-one or two-by-two blocks. The L and U factors of this block preconditioner are computed separately. The block pivot selection of this preconditioner is inherited from one of the block versions of...

Parallel Multilevel Block ILU Preconditioning Techniques for Large Sparse Linear Systems

We present a class of parallel preconditioning strategies built on a multilevel block incomplete LU (ILU) factorization technique to solve large sparse linear systems on distributed memory parallel computers. The preconditioners are constructed by using the concept of block independent sets. Two algorithms for constructing block independent sets of a distributed sparse matrix are proposed. We c...

Matrix Reordering Using Multilevel Graph Coarsening for ILU Preconditioning

Incomplete LU factorization (ILU) techniques are a well-known class of preconditioners, often used in conjunction with Krylov accelerators for the iterative solution of linear systems of equations. However, for certain problems, ILU factorizations can yield factors that are unstable, and in some cases quite dense. Reordering techniques based on permuting the matrix prior to performing the facto...

ILUT: A dual threshold incomplete LU factorization

In this paper we describe an Incomplete LU factorization technique based on a strategy which combines two heuristics. This ILUT factorization extends the usual ILU(0) factorization without using the concept of level of fill-in. There are two traditional ways of developing incomplete factorization preconditioners. The first uses a symbolic factorization approach in which a level of fill is attributed ...

Publication date: 2012